<!doctype html>
<html lang="en">
	<head>
		<meta charset="utf-8" />
		<title>mirador: data formats</title>
		<link rel="stylesheet" type="text/css" href="css/main.css" />
	</head>

	<body>
		<div id="main-con">
			<a href="index.html">
				<div id="header-con">
					<div id="img"></div>
					<div id="title"><h1>mirador</h1></div>
				</div>
			</a>
			<div id="intro" class="basic manual">
				<h2>Data formats</h2>
			</div>
			<div id="body-con" class="basic">
				<div id="manual-contents" class="topic">
                    <h1>1. Stand-alone tables</h1>
					<p>Mirador allows the upload of Comma-Separated Values (<a href="http://en.wikipedia.org/wiki/Comma-separated_values">CSV</a>), and Tab-Separated Values (<a href="http://en.wikipedia.org/wiki/Tab-separated_values">TSV</a>) files. Tables stored in the Open Document Spreadsheet (<a href="http://en.wikipedia.org/wiki/OpenDocument">ODS</a>) format are supported as well. When selecting these types of files, Mirador will interpret each column as a variable, and each row as an individual data record holding a value for each variable. Missing values are allowed in the table, and identified by the "?" character. When conducting analysis, Mirador will do pairwise deletion of records with missing values. That is, when generating a plot or evaluating the association between variables X and Y, records that have a missing value in either the X column or the Y column will be ignored.</p>

                    <h1>2. Data types</h1>
					<p>Each column or variable must have specific data types to be properly read by Mirador. The data types supported are the following:</p>
					<ul>
					<li><b>Category:</b> this type is used to characterize ordinal or nominal variables such as gender, martial status, etc.</li>
					<li><b>Integer:</b> whole numbers between -2^31 and 2^31 - 1 (-2,147,483,648 and 2,147,483,647)</li>
					<li><b>Long:</b> whole numbers between -2^63 and 2^63 - 1</li>
					<li><b>Float:</b> single-precision (32 bit) floating-point values</li>
					<li><b>Double:</b> double-precision (64 bit) floating-point values</li>
					<li><b>String:</b> arbitrary sequenc of characters, currently ignored</li>
					</ul>

                    <h1>2. Dictionaries</h1>
					<p>
					When loading a stand-alone table file, Mirador will try to guess the types of each column by inspecting its values. This approach works in most cases, but it might fail to assess the correct type in more ambiguous cases, such as when whole numbers are used to represent the different categories of an ordinal variable. In these cases, one can provide an auxiliary dictionary table that explicitly lists the data types for each column. This dictionary can be provided either as a csv or tsv file, and must contain two columns. The first column contains the names of all the variables in the data table, and the second contains the corresponding type, which should be among the following: int, long, float, double, or category. Mirador will automatically load any file called "dictionary.csv" or "dictionary.tsv" located alongside the main table file, and attempt to use it as its dictionary. Several examples are included with the Mirador download, and among them, the "Titanic" one shows how to define a simple dictionary file.
					</p>

					<p>
					An example of when a dictionary is useful is when one has category variables that are coded in the table. If no additional information is provided, then Mirador will display the codes when labeling the axes, which can be confusing to the user. The mapping between codes and descriptions can be indicated in a third column in the dictionary file. The Titanic example also demonstrates this technique.
 					</p>	

                    <h1>3. Codebooks</h1>
 					<p>
 					Sometimes, a less strict specification of types and code mappings could be convenient, for example when most of the types in the table can be accurately inferred from the values, with the exception of a few variables that need to be explicitly described. A codebook will allow for this, and a csv or tsv file will be read automatically if it is found inside the same folder as the main data file and named "codebook.csv" or "codebook.tsv". The "Diabetes 1999-2008" and "Indian Liver Patient" examples provided with the Mirador download uses codebook files and can serve as a reference for the time being.
					</p>

                    <h1>4. Mirador projects</h1>
					<p>
					Finally, more complex datasets often require additional information such as variable groupings, and sample weights (for faster loading times, save the tables in a binary format). To support these scenarios, Mirador offers a custom project format where all the information required to properly display the dataset (table and dictionary files, additional parameters, etc.) can be stored in a .mira file. This file can be selected from the file dialog, and Mirador will parse it and open all the associated files.
					</p>
				</div>
			</div>
			<div id="footer" class="basic">
				<p>Copyright © 2014 Fathom Information Design. All Rights Reserved.</p>
			</div>
		</div>
	</body>
</html>
